Sparsification Strategies in Latent Semantic Indexing
Abstract
The text retrieval method using Latent Semantic Indexing (LSI) with the truncated Singular Value Decomposition (SVD) has been intensively studied in recent years. The term-document matrices after SVD are full matrices, although the rank is reduced substantially. To reduce memory consumption, we examine some strategies to sparsify the truncated SVD matrices. After applying the sparsification strategies to three popular document databases, we find that some of our strategies not only sparsify the SVD matrices, but may also increase the accuracy of the text retrieval in some cases.
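The abstract does not say which sparsification strategies were examined, so the following is only a minimal sketch of the general idea, assuming a simple magnitude threshold: entries of the truncated SVD factors whose absolute value falls below a cutoff are set to zero. The toy matrix, the cutoff value, and the function names (`truncated_svd`, `sparsify`, `query_scores`) are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import numpy as np

def truncated_svd(A, k):
    """Return the rank-k factors U_k, s_k, Vt_k of the term-document matrix A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def sparsify(M, threshold):
    """Return a copy of M with every entry of magnitude below threshold set to zero."""
    out = M.copy()
    out[np.abs(out) < threshold] = 0.0
    return out

def query_scores(U_k, s_k, Vt_k, q):
    """Cosine similarity between query q and each document in the k-dimensional space."""
    q_hat = (q @ U_k) / s_k              # fold the query in: q^T U_k S_k^{-1}
    docs = Vt_k.T                        # one row per document
    denom = np.linalg.norm(docs, axis=1) * np.linalg.norm(q_hat)
    denom[denom == 0] = 1.0              # avoid division by zero for all-zero vectors
    return (docs @ q_hat) / denom

# Toy 5-term x 4-document matrix (assumed data, for illustration only).
A = np.array([[1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 1]], dtype=float)

k = 2
threshold = 0.3                          # assumed cutoff, not a value from the paper
U_k, s_k, Vt_k = truncated_svd(A, k)
U_sparse = sparsify(U_k, threshold)
Vt_sparse = sparsify(Vt_k, threshold)

q = np.array([1, 1, 0, 0, 0], dtype=float)   # query containing the first two terms
scores_dense = query_scores(U_k, s_k, Vt_k, q)
scores_sparse = query_scores(U_sparse, s_k, Vt_sparse, q)

print("nonzeros in U_k:", np.count_nonzero(U_k), "->", np.count_nonzero(U_sparse))
print("nonzeros in Vt_k:", np.count_nonzero(Vt_k), "->", np.count_nonzero(Vt_sparse))
```

In practice the thresholded factors would be stored in a sparse format to realize the memory savings, and retrieval quality would be compared between `scores_dense` and `scores_sparse` on a labeled collection.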
Similar Resources
Assessing the Impact of Sparsification on LSI Performance
We describe an approach to information retrieval using Latent Semantic Indexing (LSI) that directly manipulates the values in the Singular Value Decomposition (SVD) matrices. We convert the dense term by dimension matrix into a sparse matrix by removing a fixed percentage of the values. We present retrieval and runtime performance results, using seven collections, which show that using this tec...
Clustered SVD strategies in latent semantic indexing
The text retrieval method using latent semantic indexing (LSI) technique with truncated singular value decomposition (SVD) has been intensively studied in recent years. The SVD reduces the noise contained in the original representation of the term–document matrix and improves the information retrieval accuracy. Recent studies indicate that SVD is mostly useful for small homogeneous data collect...
Using Random Indexing to improve Singular Value Decomposition for Latent Semantic Analysis
We present results from using Random Indexing for Latent Semantic Analysis to handle Singular Value Decomposition tractability issues. We compare Latent Semantic Analysis, Random Indexing and Latent Semantic Analysis on Random Indexing reduced matrices. In this study we use a corpus comprising 1003 documents from the MEDLINE-corpus. Our results show that Latent Semantic Analysis on Random Index...
Distributional Semantics Approach to Thai Word Sense Disambiguation
Word sense disambiguation is one of the most important open problems in natural language processing applications such as information retrieval and machine translation. Many strategies can be employed to resolve word ambiguity with a reasonable degree of accuracy. These strategies are knowledge-based, corpus-based, and hybrid. This paper pays attention to the corpus-based strategy...